Abstract: We study the asymptotic behavior of piecewise constant least squares regression estimates when the number of partitions of the estimate is penalized. We show that the estimator is consistent in the relevant metric if the signal is in $L^2([0,1))$, in the space $D([0,1))$ of càdlàg functions equipped with the Skorokhod metric, or in $C([0,1])$ equipped with the supremum metric. Moreover, we consider the family of estimates under a varying smoothing parameter, also called scale space. We prove convergence of the empirical scale space towards its deterministic target.



1. Introduction 

The use of piecewise constant functions for regression was first proposed by [?], who called the corresponding reconstruction the regressogram. [?] proposed it as a simple exploratory tool. For a given set of jump locations, the regressogram simply averages the data between two successive jumps. A difficult issue, however, is the proper selection of the jump locations and the convergence analysis of the resulting estimate.

Approximation by step functions is well examined in approximation theory (see e.g. [7]), and there are several statistical estimation procedures which use locally constant reconstructions. [13] studied the case where the signal is a step function with one jump and showed that in this case the signal can be estimated at the parametric $n^{-1/2}$-rate and that the jump location can be estimated at a rate of $n^{-1}$. This was generalized by [28] and [29] to step functions with a known upper bound for the number of jumps. The locally adaptive regression splines method by [?] and the taut string procedure by [?] use locally constant estimates to reconstruct unknown regression functions, which belong to more general function classes. Both methods reduce the complexity of the reconstruction by minimizing the total variation of the estimator, which in turn leads to a small number of local extreme values.
In this work we choose a different approach and define the complexity of the reconstruction by the number of intervals where the reconstruction is constant, or equivalently by the number of jumps of the reconstruction. Compared to the total variation approach, this method captures extreme plateaus more easily but is less robust to outliers. This might be of interest in applications where extreme plateaus are informative, for example in mass spectroscopy.

Throughout the following, we assume a regression model of the type 

$$Y_i = f_{i,n} + \xi_{i,n}, \qquad (i = 1, \dots, n), \tag{1}$$

where $(\xi_{i,n})_{i=1,\dots,n}$ is a triangular array of independent zero-mean random variables and $f_{i,n}$ is the mean value of a square integrable function $f \in L^2([0,1))$ over the interval $[(i-1)/n, i/n)$ (see e.g. [9]),

$$f_{i,n} = n \int_{(i-1)/n}^{i/n} f(u)\,du. \tag{2}$$

This model is well suited for physical applications, where observations of this type 
are quite common. 
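To make model (1) and (2) concrete, the following Python sketch computes the cell averages $f_{i,n}$ exactly when $f$ is a step function and then adds independent Gaussian noise, which is one admissible choice of the array $\xi_{i,n}$. The encoding of $f$ by breakpoints and levels and the function names `cell_means` and `sample` are illustrative assumptions, not part of the paper.

```python
import random

def cell_means(breaks, values, n):
    """Exact f_{i,n} = n * integral of f over [(i-1)/n, i/n) for the step function
    f = values[k] on [breaks[k], breaks[k+1]), with breaks[0] = 0 and breaks[-1] = 1."""
    means = []
    for i in range(1, n + 1):
        a, b = (i - 1) / n, i / n
        total = 0.0
        for k in range(len(values)):
            lo, hi = max(a, breaks[k]), min(b, breaks[k + 1])
            if hi > lo:
                total += values[k] * (hi - lo)  # level times length of the overlap
        means.append(n * total)
    return means

def sample(breaks, values, n, sigma, seed=0):
    """Draw Y_i = f_{i,n} + xi_i with i.i.d. N(0, sigma^2) noise (an assumed noise model)."""
    rng = random.Random(seed)
    return [m + rng.gauss(0.0, sigma) for m in cell_means(breaks, values, n)]
```

For instance, `cell_means([0, 0.5, 1], [0.0, 2.0], 4)` gives the four cell averages of a single upward jump at $1/2$; a cell that straddles the jump receives the length-weighted average of the two levels.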

We consider minimizers $T_\gamma(Y_n) \in \operatorname{argmin} H_\gamma(\cdot, Y_n)$ of the hard thresholding functional

$$H_\gamma(u, Y_n) = \gamma \cdot \#J(u) + \frac{1}{n} \sum_{i=1}^{n} (u_i - Y_i)^2, \tag{3}$$

where

$$J(u) = \{ i : 1 \le i \le n-1,\ u_i \ne u_{i+1} \}$$

is the set of jumps of $u$. In the following we will call the minimizers of (3) jump penalized least squares estimators, or Jplse for short.

Choosing $\gamma$ amounts to choosing the number of partitions of the Jplse. Figure 1 shows the Jplse for a sample dataset and different choices of the smoothing parameter $\gamma$.
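For a fixed $\gamma$, a minimizer of (3) can be computed by dynamic programming over the position of the last jump (a Bellman recursion over optimal partitions). The Python sketch below is one straightforward $O(n^2)$ implementation written for illustration; the name `jplse` and the prefix-sum bookkeeping are our own choices, and it is not claimed to be the algorithm used by the authors.

```python
from itertools import accumulate

def jplse(y, gamma):
    """Return (u, value) where u minimizes gamma * #J(u) + (1/n) * sum_i (u_i - y_i)^2."""
    n = len(y)
    s1 = [0.0] + list(accumulate(y))                 # prefix sums of y
    s2 = [0.0] + list(accumulate(v * v for v in y))  # prefix sums of y^2

    def sse(l, r):
        # residual sum of squares of y[l:r] around its own mean
        t = s1[r] - s1[l]
        return (s2[r] - s2[l]) - t * t / (r - l)

    # best[r]: optimal value for the first r observations when every segment
    # costs gamma; best[0] = -gamma makes the first segment free (it has no jump).
    best = [-gamma] + [0.0] * n
    last = [0] * (n + 1)  # start index of the final segment of an optimal solution
    for r in range(1, n + 1):
        best[r], last[r] = min((best[l] + gamma + sse(l, r) / n, l) for l in range(r))

    # Backtrack the optimal segments and fill each with its mean.
    u, r = [0.0] * n, n
    while r > 0:
        l = last[r]
        mean = (s1[r] - s1[l]) / (r - l)
        u[l:r] = [mean] * (r - l)
        r = l
    return u, best[n]
```

If two adjacent segments had equal means, the recursion would charge a jump that (3) does not count; merging them never increases the value, so the returned value is still the minimum of (3). For example, `jplse([0, 0, 0, 5, 5, 5], 0.1)` keeps the single jump, while a large $\gamma$ such as `100.0` returns the constant fit $2.5$.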

This paper complements work of the authors on convergence rates of the Jplse. In [1] it is shown that, given a proper choice of the smoothing parameter $\gamma$, it is possible to obtain optimal rates for certain classes of approximation spaces under the assumption of subgaussian tails of the error distribution. As special cases, the class of piecewise Hölder continuous functions of order $0 < \alpha \le 1$ and the class of functions with bounded total variation are obtained.

In this paper we show consistency of regressograms constructed by minimizing (3) for arbitrary functions and under more general assumptions on the error. If the true function is càdlàg, we additionally show consistency in the Skorokhod topology. This is a substantially stronger statement than $L^2$ convergence and yields consistency of the whole graph of the estimator.

In concrete applications the choice of the regularization parameter $\gamma > 0$ in (3), which controls the degree of smoothness (here simply the number of jumps) of the estimate $T_\gamma(Y_n)$, is a delicate and important task. As in kernel regression [3, ?], a screening of the estimates over a larger region can be useful (see [16, 26]). Adapting a viewpoint from computer vision (see [15]), [?] and [13] proposed to



consider the family $(T_\gamma(f))_{\gamma>0}$, denoted as scale space, as target of inference. This was justified in [?] by the fact that the empirical scale space converges towards that of the actual density or regression function pointwise and uniformly on compact sets. The main motivation for analyzing the scale space is the exploration of structures such as peaks and valleys in regression and the detection of modes in density estimation. In kernel smoothing, the scale space has the properties that structures like modes disappear monotonically as the resolution becomes coarser and that the reconstruction changes continuously with respect to the bandwidth. For the Jplse, the family $(T_\gamma(f))_{\gamma>0}$ behaves quite differently. Notable distinctions are that jumps may not change monotonically and that there are only finitely many possible different estimates. To deal with these features, we consider convergence of the scale space in the space of càdlàg functions equipped with the Skorokhod $J_1$ topology. In this setting we deduce (under identifiability assumptions) convergence of the empirical scale space towards its deterministic target. Note that the computation of the empirical scale space is feasible: the family $(T_\gamma(Y_n))_{\gamma>0}$ can be computed in $O(n^3)$ and the minimizer for one $\gamma$ in $O(n^2)$ steps (see [26]).

Fig 1. The Jplse for different values of $\gamma$. The dots represent the noisy observations of some signal $f$ represented by the grey line. The black line shows the estimator, with $\gamma$ chosen such that the reconstruction has four, six, eight and ten partitions, respectively.

The paper is organized as follows. After introducing some notation in Section 2, we provide in Section 3.1 the consistency results for general functions in the $L^2$ metric. In Section 3.2 we present the results on convergence in the Skorokhod topology. Finally, in Section 3.3 convergence results for the scale space are given. The proofs as well as a short introduction to the concept of epi-convergence, which is required in the main part of the proofs, are given in the Appendix.

2. Model assumptions 

By $S([0,1)) = \operatorname{span}\{\mathbf 1_{[s,t)} : 0 \le s < t \le 1\}$ we will denote the space of step functions with a finite but arbitrary number of jumps, and by $D([0,1))$ the càdlàg space of right continuous functions on $[0,1]$ with left limits and left continuous at 1. Both will be considered as subspaces of $L^2([0,1))$ with the obvious identification of a function with its equivalence class, which is injective for these two spaces. More generally, by $D([0,1),\Theta)$ and $D([0,\infty),\Theta)$ we will denote spaces of functions with values in a metric space $(\Theta, \rho)$ which are right continuous and have left limits. $\|\cdot\|$ will denote the norm of $L^2([0,1))$, and the norm on $L^\infty([0,1))$ is denoted by $\|\cdot\|_\infty$. Minimizers of the hard thresholding functionals (3) will be embedded into $L^2([0,1))$ by the map $\iota^n : \mathbb R^n \to L^2([0,1))$,

$$\iota^n((u_1, \dots, u_n)) = \sum_{i=1}^{n} u_i \mathbf 1_{[(i-1)/n,\, i/n)}.$$



Under the regression model (1) this leads to estimates $\hat f_n = \iota^n(T_{\gamma_n}(Y_n))$, i.e.

$$\hat f_n \in \iota^n\bigl(\operatorname{argmin} H_{\gamma_n}(\cdot, Y_n)\bigr).$$

Note that for a functional $F$ we denote by $\operatorname{argmin} F$ the whole set of minimizers. Here and in the following, $(\gamma_n)_{n\in\mathbb N}$ is a (possibly random) sequence of smoothing parameters. We suppress the dependence of $\hat f_n$ on $\gamma_n$ since this choice will be clear from the context.

For the noise, we assume the following condition. 

(A) For all $n \in \mathbb N$ the random variables $(\xi_{i,n})_{1\le i\le n}$ are independent. Moreover, there exists a sequence $(\beta_n)_{n\in\mathbb N}$ with $n^{-1}\beta_n \to 0$ such that

$$\max_{1\le i\le j\le n} \frac{(\xi_{i,n} + \dots + \xi_{j,n})^2}{j-i+1} \le \beta_n \quad \text{P-a.s.} \tag{4}$$

for almost every $n$.

The behavior of the process (4) is well known for certain classes of i.i.d. subgaussian random variables (see e.g. [?]). If, for example, $\xi_{i,n} = \xi_i \sim N(0,\sigma^2)$ for all $i = 1, \dots, n$ and all $n$, we can choose $\beta_n = 2\sigma^2 \log n$ in Condition (A). The next result shows that (A) is satisfied for a broad class of subgaussian random variables.
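The following Python sketch evaluates the left-hand side of (4) for a given noise vector, which makes it easy to compare the statistic with a candidate $\beta_n$ (for instance $2\sigma^2 \log n$ in the Gaussian case) on simulated data. The function names are hypothetical, and a single simulated draw only illustrates the size of the statistic; it does not verify Condition (A), which is an almost sure statement about all large $n$.

```python
import math
import random

def max_partial_sum_stat(xi):
    """Left-hand side of (4): max over 1 <= i <= j <= n of (xi_i + ... + xi_j)^2 / (j - i + 1).

    With prefix sums every interval sum costs O(1), so all n(n+1)/2 intervals take O(n^2)."""
    prefix = [0.0]
    for v in xi:
        prefix.append(prefix[-1] + v)
    n = len(xi)
    # prefix[j] - prefix[i] is the sum of the (0-based) entries i, ..., j-1
    return max((prefix[j] - prefix[i]) ** 2 / (j - i)
               for i in range(n) for j in range(i + 1, n + 1))

def gaussian_example(n=500, sigma=1.0, seed=1):
    """One draw of i.i.d. N(0, sigma^2) noise: returns (statistic (4), 2 * sigma^2 * log n)."""
    rng = random.Random(seed)
    xi = [rng.gauss(0.0, sigma) for _ in range(n)]
    return max_partial_sum_stat(xi), 2 * sigma ** 2 * math.log(n)
```

For the vector $(1, -1, 2)$ the maximum is attained by the single entry $2$, giving the value $4$.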

Lemma 1. Assume the noise satisfies the following generalized subgaussian condition

$$\mathbb E\, e^{u \xi_{i,n}} \le e^{\alpha n^{\zeta} u^2} \qquad (\text{for all } u \in \mathbb R,\ n \in \mathbb N,\ 1 \le i \le n) \tag{5}$$

with $0 \le \zeta < 1$ and $\alpha > 0$. Then there exists a $C > 0$ such that for $\beta_n = C n^{\zeta} \log n$ Condition (A) is satisfied.

A more common moment condition is given by the following lemma. 

Lemma 2. Assume the noise satisfies

$$\sup_{n\in\mathbb N,\ 1\le i\le n} \mathbb E |\xi_{i,n}|^{2m} < \infty \tag{6}$$

for some $m > 2$. Then for all $C > 0$ and $\beta_n = C (n \log n)^{2/m}$ Condition (A) is satisfied.

3. Consistency

In order to extend the functional in (3) to $L^2([0,1))$, we define for $\gamma > 0$ the functionals $H^\infty_\gamma : L^2([0,1)) \times L^2([0,1)) \to \mathbb R \cup \{\infty\}$,

$$H^\infty_\gamma(g, f) = \begin{cases} \gamma \cdot \#J(g) + \|f - g\|^2, & g \in S([0,1)), \\ \infty, & \text{otherwise}. \end{cases}$$

Here

$$J(g) = \{ t \in (0,1) : g(t-) \ne g(t+) \}$$

is the set of jumps of $g \in S([0,1))$. For $\gamma = 0$, we set $H^\infty_0(g, f) = \|f - g\|^2$ for all $g \in L^2([0,1))$. The following lemma guarantees the existence of a minimizer.

Lemma 3. For any $f \in L^2([0,1))$ and all $\gamma \ge 0$ we have

$$\operatorname{argmin} H^\infty_\gamma(\cdot, f) \ne \emptyset.$$



In the following we assume that $Y_n$ is determined through (1), that the noise $\xi_n$ satisfies (A), and that $(\beta_n)_{n\in\mathbb N}$ is a sequence with $\beta_n/n \to 0$ such that (4) holds.



3.1. Convergence in $L^2$

We start by investigating the asymptotic behavior of the Jplse when the sequence $\gamma_n$ converges to a constant $\gamma$ greater than zero. In this case we do not recover the original function in the limit, but a parsimonious representation at a certain scale of interest determined by $\gamma$.

Theorem 1. Suppose that $f \in L^2([0,1))$ and $\gamma > 0$ are such that $f_\gamma$ is a unique minimizer of $H^\infty_\gamma(\cdot, f)$. Then for any (random) sequence $(\gamma_n)_{n\in\mathbb N} \subset (0,\infty)$ with $\gamma_n \to \gamma$ P-a.s., we have

$$\hat f_n \xrightarrow[n\to\infty]{L^2} f_\gamma \quad \text{P-a.s.}$$

The next theorem states the consistency of the Jplse towards the true signal for $\gamma = 0$ under some conditions on the sequence $\gamma_n$.

(H) $(\gamma_n)_{n\in\mathbb N}$ satisfies $\gamma_n \to 0$ and $\gamma_n n / \beta_n \to \infty$ P-a.s.

Theorem 2. Assume $f \in L^2([0,1))$ and $(\gamma_n)_{n\in\mathbb N}$ satisfies (H). Then

$$\hat f_n \xrightarrow[n\to\infty]{L^2} f \quad \text{P-a.s.}$$



3.2. Convergence in Skorokhod topology 

As we use càdlàg functions for reconstructing the original signal, it is natural to ask whether it is possible to obtain consistency in the Skorokhod topology.

We recall the definition of the Skorokhod metric [?, Sections 5 and 6]. Let $\Lambda_\infty$ denote the set of all strictly increasing continuous functions $\lambda : \mathbb R_+ \to \mathbb R_+$ which are onto. We define for $f, g \in D([0,\infty), \Theta)$

$$\rho_S(f, g) = \inf_{\lambda \in \Lambda_\infty} \max\Bigl( L(\lambda),\ \int_0^\infty e^{-u} \sup_{t \ge 0} \min\bigl(1, \rho(f(\lambda(t) \wedge u), g(t \wedge u))\bigr)\, du \Bigr),$$

where $L(\lambda) = \sup_{s > t \ge 0} \bigl| \log \frac{\lambda(s) - \lambda(t)}{s - t} \bigr|$. Similarly, $\Lambda_1$ is the set of all strictly increasing continuous onto functions $\lambda : [0,1] \to [0,1]$, with the appropriate definition of $L$. Slightly abusing notation, we set for $f, g \in D([0,1), \Theta)$

$$\rho_S(f, g) = \inf\Bigl\{ \max\Bigl( L(\lambda),\ \sup_{0 \le t \le 1} \rho(f(\lambda(t)), g(t)) \Bigr) : \lambda \in \Lambda_1 \Bigr\}.$$

The topology induced by this metric is called the $J_1$ topology. Having fixed the metric, we find that in the situation of Theorem 1 we can establish consistency without further assumptions, whereas in the situation of Theorem 2, $f$ has to belong to $D([0,1))$.
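For two real step functions on $[0,1)$ with the same number of jumps, the infimum in $\rho_S$ can be bounded by plugging in one admissible time change: the piecewise linear $\lambda \in \Lambda_1$ that maps the jumps of $g$ onto those of $f$, as is done with linear interpolation in the proof of Theorem 3, part (i). The Python sketch below computes this upper bound; the function name and the input encoding (sorted jump points plus one value per cell) are assumptions made for illustration.

```python
import math

def skorokhod_step_bound(jumps_f, vals_f, jumps_g, vals_g):
    """Upper bound for rho_S(f, g) on D([0, 1)) for real step functions with the
    same number of jumps.

    f equals vals_f[k] on the k-th cell of the partition generated by its jump
    points in (0, 1); likewise for g. The bound evaluates
    max(L(lambda), sup_t |f(lambda(t)) - g(t)|) for the piecewise linear lambda
    that maps the jumps of g onto the jumps of f."""
    if len(jumps_f) != len(jumps_g):
        raise ValueError("f and g must have the same number of jumps")
    if not len(vals_f) == len(vals_g) == len(jumps_f) + 1:
        raise ValueError("need exactly one value per cell")
    s = [0.0] + sorted(jumps_f) + [1.0]   # cell boundaries of f
    t = [0.0] + sorted(jumps_g) + [1.0]   # cell boundaries of g
    # lambda is linear on [t_k, t_{k+1}] with slope (s_{k+1} - s_k) / (t_{k+1} - t_k).
    # Every chord slope of such a map lies between its smallest and largest piece
    # slope, so L(lambda) equals the largest |log slope|.
    L = max(abs(math.log((s[k + 1] - s[k]) / (t[k + 1] - t[k])))
            for k in range(len(s) - 1))
    # On [t_k, t_{k+1}) we have f(lambda(t)) = vals_f[k] and g(t) = vals_g[k].
    sup = max(abs(a - b) for a, b in zip(vals_f, vals_g))
    return max(L, sup)
```

Shifting a jump from $0.5$ to $0.4$ while perturbing one level by $0.1$ gives a bound of $\log 1.25$, so here the time distortion dominates the level mismatch.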
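Theorem 3. (i) Under the assumptions of Theorem 1, $f_\gamma \in S([0,1))$ and

$$\hat f_n \xrightarrow[n\to\infty]{\rho_S} f_\gamma \quad \text{P-a.s.}$$

(ii) If $f \in D([0,1))$ and $(\gamma_n)_{n\in\mathbb N}$ satisfies (H), then

$$\hat f_n \xrightarrow[n\to\infty]{\rho_S} f \quad \text{P-a.s.}$$

If $f$ is continuous on $[0,1]$, then

$$\hat f_n \xrightarrow[n\to\infty]{L^\infty} f \quad \text{P-a.s.}$$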


3.3. Convergence of the scale spaces

As mentioned in the introduction, following [?], we now want to study the scale space family $(T_\gamma(f))_{\gamma>0}$ as target for inference. First we show that the map $\gamma \mapsto T_\gamma(f)$ can be chosen piecewise constant, with only finitely many jumps in every interval $[\nu, \infty)$, $\nu > 0$.

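Lemma 4. Let $f \in L^2([0,1))$. Then there exist a number $m(f) \in \mathbb N \cup \{\infty\}$ and a decreasing sequence $(\gamma_m)_{m=0}^{m(f)} \subset \mathbb R \cup \{\infty\}$ such that

(i) $\gamma_0 = \infty$, $\gamma_{m(f)} = 0$,

(ii) for all $1 \le i \le m(f)$ and $\gamma', \gamma'' \in (\gamma_i, \gamma_{i-1})$ we have that
$$\operatorname{argmin} H^\infty_{\gamma'}(\cdot, f) = \operatorname{argmin} H^\infty_{\gamma''}(\cdot, f),$$

(iii) for all $1 \le i \le m(f) - 1$ and $\gamma_{i+1} < \gamma' < \gamma_i < \gamma'' < \gamma_{i-1}$ we have
$$\operatorname{argmin} H^\infty_{\gamma_i}(\cdot, f) \supseteq \operatorname{argmin} H^\infty_{\gamma'}(\cdot, f) \cup \operatorname{argmin} H^\infty_{\gamma''}(\cdot, f),$$
and

(iv) for all $\gamma' > \gamma_1$
$$\operatorname{argmin} H^\infty_{\gamma'}(\cdot, f) = \{T_\infty(f)\}.$$

Here $T_\infty(f)$ is defined by $T_\infty(f)(x) = \int_0^1 f(u)\,du \cdot \mathbf 1_{[0,1)}(x)$.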

Thus we may consider functions $\hat\tau_n \in D([0,\infty), L^2([0,1)))$ with

$$\hat\tau_n(\zeta) \in \iota^n\bigl(\operatorname{argmin} H_{1/\zeta}(\cdot, Y_n)\bigr)$$

for all $\zeta \ge 0$ (with the convention $1/0 = \infty$). We will call $\hat\tau_n$ the empirical scale space. Similarly, we define the deterministic scale space $\tau$ for a given function $f$ such that

$$\tau(\zeta) \in \operatorname{argmin} H^\infty_{1/\zeta}(\cdot, f) \qquad (\text{for all } \zeta \ge 0). \tag{7}$$

The following theorem shows that the empirical scale space converges almost surely to the deterministic scale space. Table 1 and Figure 2 demonstrate this in a finite setting for the Blocks signal introduced by [10].

Theorem 4. Suppose $f \in L^2([0,1))$ is such that $\#\operatorname{argmin} H^\infty_\gamma(\cdot, f) = 1$ for all but a countable number of $\gamma > 0$ and $\#\operatorname{argmin} H^\infty_\gamma(\cdot, f) \le 2$ for all $\gamma > 0$. Then $\tau$ is uniquely determined by (7). Moreover,

$$\hat\tau_n \xrightarrow[n\to\infty]{} \tau \quad \text{P-a.s.}$$

holds both in $D([0,\infty), D([0,1)))$ and $D([0,\infty), L^2([0,1)))$.
Fig 2. Comparison of scale spaces. The "Blocks" data of [10] sampled at 64 points (dots) are compared with the different parts of the scale space derived both from the data (black) and from the original signal (grey), starting with $\gamma = \infty$ and lowering its value from left to right and top to bottom. Note that for the original sampling rate of 2048 points the scale spaces are virtually identical.
Table 1

Comparison of scale spaces. For the "Blocks" data of [10] sampled at 64 points with a signal-to-noise ratio of 7, the eleven largest $\gamma$ values (see Lemma 4) for the deterministic signal (bottom) and the noisy signal (top) are compared. The last two values of the bottom row are equal to zero, since there are only nine ways to reconstruct the deterministic signal.

Noisy signal (top):             852  2?7  173  148  108  99.8  55.9  46.6  5.36  4.62  2.9
Deterministic signal (bottom):  885  249  159  142  100  99.1  80.2  41.3  38.9  0     0



Fig 3. Scale space of a sample function (grey line). The black lines show all reconstructions of the sample function for varying $\gamma$.



Discussion. The scale space of a penalized estimator with hard thresholding type penalties generally does not have the same nice properties as its counterparts stemming from an $\ell_2$- or $\ell_1$-type penalty. In our case the function value at some point of the reconstruction does not change continuously or monotonically in the smoothing parameter. Moreover, the set of jumps of a best reconstruction with $k$ partitions is not necessarily contained in the set of jumps of a best reconstruction with $k'$ partitions for $k < k'$; see Figure 3. This leads to increased computational costs, as greedy algorithms in general do not yield an optimal solution. Indeed, one needs only $O(n \log n)$ steps to compute the estimate for a given $\gamma$ if the penalty is of $\ell_1$ type, as in the locally adaptive regression splines of [?], compared to $O(n^2)$ steps for the Jplse.

We mention that penalizing the number of jumps corresponds to an $\ell_0$-penalty and is a limiting case of the functional of [20] when the dimension of the signal (image) is $d = 1$ [27], and results in a "hard segmentation" of the data [24].



4. Proofs 



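Some additional notation. Throughout this section, we shorten $J(\hat f_n)$ to $J_n$. We set $S_n([0,1)) = \iota^n(\mathbb R^n)$ and $\mathcal B_n = \sigma(S_n([0,1)))$. Observe that $\iota^n(f_n)$, with $f_n = (f_{1,n}, \dots, f_{n,n})$, is just the conditional expectation $\mathbb E_{U_{0,1}}(f \mid \mathcal B_n)$, denoting the uniform distribution on $[0,1)$ by $U_{0,1}$. Similarly, for any finite $J \subset (0,1)$ define $\mathcal B_J = \sigma(\{[a,b) : a, b \in J \cup \{0,1\}\})$ and the partition $P_J = \{[a,b) : a < b,\ a, b \in J \cup \{0,1\},\ (a,b) \cap J = \emptyset\}$. For our proofs it is convenient to formulate all minimization procedures on $L^2([0,1))$. Therefore we introduce the following functionals $H^{(n)}_\gamma, \tilde H_\gamma : L^2([0,1)) \times L^2([0,1)) \to \mathbb R \cup \{\infty\}$, defined as

$$H^{(n)}_\gamma(g, f) = \begin{cases} \gamma \cdot \#J(g) + \|f - g\|^2 - \|f\|^2, & g \in S_n([0,1)), \\ \infty, & \text{otherwise}, \end{cases}$$

$$\tilde H_\gamma(g, f) = H^\infty_\gamma(g, f) - \|f\|^2.$$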
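Clearly, for each $f$, $\tilde H_\gamma(\cdot, f)$ has the same minimizers as $H^\infty_\gamma(\cdot, f)$, the two differing only by the constant $\|f\|^2$. The following lemma relates the minimizers of $H^{(n)}_\gamma$ and $H_\gamma$.

Lemma 5. For all $f \in L^2([0,1))$ and $n \in \mathbb N$ we have $u \in \operatorname{argmin} H_\gamma(\cdot, f_n)$ if and only if $\iota^n(u) \in \operatorname{argmin} H^{(n)}_\gamma(\cdot, f)$. Similarly, $u \in \operatorname{argmin} H_\gamma(\cdot, y)$ for $y \in \mathbb R^n$ if and only if $\iota^n(u) \in \operatorname{argmin} H^{(n)}_\gamma(\cdot, \iota^n(y))$.

Proof. The second assertion follows from the fact that for $u, y \in \mathbb R^n$

$$H^{(n)}_\gamma(\iota^n(u), \iota^n(y)) = H_\gamma(u, y) - \|\iota^n(y)\|^2.$$

Further, for $u \in \mathbb R^n$ we have $\langle \iota^n(f_n) - f, \iota^n(f_n) - \iota^n(u) \rangle = 0$, which gives

$$\begin{aligned} H^{(n)}_\gamma(\iota^n(u), f) &= \gamma \#J(u) + \|f - \iota^n(u)\|^2 - \|f\|^2 \\ &= \gamma \#J(u) + \|\iota^n(f_n) - \iota^n(u)\|^2 + \|f - \iota^n(f_n)\|^2 - \|f\|^2 \\ &= H_\gamma(u, f_n) + \mathrm{const}_{f,n}, \end{aligned}$$

which completes the proof. □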

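The minimizers $g \in S([0,1))$ of $H^{(n)}_\gamma(\cdot, f)$ and $\tilde H_\gamma(\cdot, f)$ for $\gamma > 0$ are determined by their jump set $J(g)$ through the formula $g = \mathbb E_{U_{0,1}}(f \mid \mathcal B_{J(g)})$. In the sequel, we abbreviate

$$\mu_I(f) = \frac{1}{|I|} \int_I f(u)\,du$$

to denote the mean of $f$ on some interval $I$. In addition, we will use the abbreviation $f_J := \mathbb E_{U_{0,1}}(f \mid \mathcal B_J)$, such that for any partition $P_J$ of $[0,1)$

$$f_J = \sum_{I \in P_J} \mu_I(f)\, \mathbf 1_I.$$

Further, we extend the noise in (1) to $L^2([0,1))$ by

$$\xi_n = \sum_{i=1}^{n} \xi_{i,n} \mathbf 1_{[(i-1)/n,\, i/n)}.$$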

4.1. Technical tools



We start by giving estimates on the behavior of $(\xi_n)_J = \sum_{I \in P_J} \mathbf 1_I\, \mu_I(\xi_n)$.

Lemma 6. Assume $(\xi_{i,n})_{n\in\mathbb N, 1\le i\le n}$ satisfies (A). Then P-almost surely for all intervals $I \subset [0,1)$ and all $n \in \mathbb N$

$$|\mu_I(\xi_n)|^2 \le \frac{\beta_n}{n |I|}.$$

Proof. For intervals of the type $[(i-1)/n, j/n)$ with $i \le j \in \mathbb N$ the claim is a direct consequence of (4). For general intervals $I = [(i - p_1)/n, (j - 1 + p_2)/n)$ with $i < j$ and $p_1, p_2 \in [0,1]$, we have to show that

$$(p_1 \xi_{i,n} + \xi_{i+1,n} + \dots + \xi_{j-1,n} + p_2 \xi_{j,n})^2 - \beta_n (p_1 + p_2 + j - i - 1) \le 0.$$

The left-hand side is convex on $[0,1]^2$ when considered as a function of $(p_1, p_2)$. Hence it attains its maximum at an extreme point of $[0,1]^2$, where the inequality follows from the first case. □
Lemma 7. There is a set of P-probability one on which for all sequences $(J_n)_{n\in\mathbb N}$ of finite sets in $(0,1)$ the relation $\lim_{n\to\infty} \beta_n \#J_n / n = 0$ implies

$$\|(\xi_n)_{J_n}\| \xrightarrow[n\to\infty]{} 0.$$

Proof. By Lemma 6 we find

$$\|(\xi_n)_{J_n}\|^2 = \sum_{I \in P_{J_n}} |I|\, \mu_I(\xi_n)^2 \le \frac{\beta_n}{n} (\#J_n + 1). \tag{8}$$

This immediately gives the assertion. □
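Inequality (8) can be checked numerically for jump sets on the grid $\{1/n, \dots, (n-1)/n\}$. The Python sketch below computes $\|(\xi_n)_J\|^2$ cell by cell and compares it with $(\beta_n/n)(\#J + 1)$, where, as an assumption for the illustration, $\beta_n$ is replaced by the realized value of the statistic in (4), the smallest admissible choice for that draw. All function names are hypothetical.

```python
import random

def projected_noise_norm2(xi, cuts):
    """||(xi_n)_J||^2 for the step embedding of xi = (xi_1, ..., xi_n) and the jump
    set J = {c / n : c in cuts}, where cuts are distinct integers in 1..n-1.

    (xi_n)_J equals mu_I(xi_n) on each cell I of P_J, so the squared L^2 norm is the
    sum over cells of |I| * mu_I(xi_n)^2."""
    n = len(xi)
    bounds = [0] + sorted(cuts) + [n]
    total = 0.0
    for a, b in zip(bounds, bounds[1:]):
        m = b - a                    # the cell covers m grid cells, so |I| = m / n
        mean = sum(xi[a:b]) / m      # mu_I(xi_n)
        total += (m / n) * mean ** 2
    return total

def max_partial_sum_stat(xi):
    """Left-hand side of (4): max over i <= j of (xi_i + ... + xi_j)^2 / (j - i + 1)."""
    prefix = [0.0]
    for v in xi:
        prefix.append(prefix[-1] + v)
    n = len(xi)
    return max((prefix[j] - prefix[i]) ** 2 / (j - i)
               for i in range(n) for j in range(i + 1, n + 1))

def check_inequality_8(n=200, n_cuts=10, seed=0):
    """Return (lhs, rhs) of (8) for one Gaussian draw and one random grid jump set."""
    rng = random.Random(seed)
    xi = [rng.gauss(0.0, 1.0) for _ in range(n)]
    cuts = rng.sample(range(1, n), n_cuts)
    beta = max_partial_sum_stat(xi)
    return projected_noise_norm2(xi, cuts), beta / n * (n_cuts + 1)
```

Each cell contributes $S_I^2/(n\,m_I) \le \beta_n/n$, where $S_I$ is the noise sum over the $m_I$ grid cells of $I$, which is exactly the term-by-term bound behind (8).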

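Now we wish to show that the functionals epi-converge (see [?, Section 14.4]). To this end we need two more results.

Lemma 8. Let $(J_n)_{n\in\mathbb N}$ be a sequence of closed subsets in $(0,1)$ which satisfies the relation $\lim_{n\to\infty} \beta_n \#J_n / n = 0$. For $(g_n)_{n\in\mathbb N} \subset L^2([0,1))$ with $\|g_n - g\| \to 0$, where $g_n$ is $\mathcal B_{J_n}$-measurable, we have almost surely

$$\|f + \xi_n - g_n\|^2 - \|f + \xi_n\|^2 \xrightarrow[n\to\infty]{} \|f - g\|^2 - \|f\|^2.$$

Proof. First observe that

$$\begin{aligned} \|f + \xi_n - g_n\|^2 - \|f + \xi_n\|^2 &= \|g_n\|^2 - 2\langle f, g_n\rangle - 2\langle \xi_n, g_n\rangle \\ &= \|g_n\|^2 - 2\langle f, g_n\rangle - 2\langle (\xi_n)_{J_n}, g_n\rangle, \end{aligned}$$

where the second equality holds because $g_n$ is $\mathcal B_{J_n}$-measurable. Since the sequence $(\|g_n\|)_{n\in\mathbb N}$ is bounded, we can use Lemma 7 to deduce

$$|\langle (\xi_n)_{J_n}, g_n \rangle| \le \|(\xi_n)_{J_n}\|\, \|g_n\| \xrightarrow[n\to\infty]{} 0,$$

while $\|g_n\|^2 - 2\langle f, g_n\rangle \to \|g\|^2 - 2\langle f, g\rangle = \|f - g\|^2 - \|f\|^2$. This completes the proof. □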

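Before stating the next result, we recall the definition of the Hausdorff metric $\rho_H$ on the space $CL(\Theta)$ of closed subsets of a compact metric space $(\Theta, \rho)$. For $\Theta' \subset \Theta$ and $\vartheta \in \Theta$ we set

$$\operatorname{dist}(\vartheta, \Theta') = \inf\{\rho(\vartheta, \vartheta') : \vartheta' \in \Theta'\}.$$

Define

$$\rho_H(A, B) = \begin{cases} \max\{\sup_{x \in A} \operatorname{dist}(x, B),\ \sup_{y \in B} \operatorname{dist}(y, A)\}, & A, B \ne \emptyset, \\ 1, & \text{exactly one of } A, B \text{ is empty}, \\ 0, & A = B = \emptyset. \end{cases}$$

With this metric, $CL(\Theta)$ is again compact for compact $\Theta$ (see [?]).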
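For the finite jump sets that appear in the proofs (augmented by the endpoints $0$ and $1$), $\rho_H$ reduces to a maximum of finitely many distances. The Python sketch below implements the definition above for finite subsets of $[0,1]$; the function name is hypothetical.

```python
def hausdorff(A, B):
    """Hausdorff distance rho_H between finite subsets of [0, 1], with the
    conventions rho_H = 1 if exactly one set is empty and 0 if both are empty."""
    A, B = set(A), set(B)
    if not A and not B:
        return 0.0
    if not A or not B:
        return 1.0

    def dist(x, S):
        # dist(x, S) = inf over s in S of |x - s|, a minimum for finite S
        return min(abs(x - s) for s in S)

    return max(max(dist(a, B) for a in A), max(dist(b, A) for b in B))
```

For example, the augmented jump sets $\{0, 1/2 + 1/n, 1\}$ converge to $\{0, 1/2, 1\}$ at distance exactly $1/n$ once $n \ge 4$, which is the kind of convergence $J(g_n) \cup \{0,1\} \to J \cup \{0,1\}$ used in the proof of Lemma 9.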
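Lemma 9. The map

$$L^2([0,1)) \ni g \mapsto \#J(g) \in \mathbb N \cup \{0, \infty\},$$

where $\#J(g) = \infty$ for $g \notin S([0,1))$, is lower semi-continuous, meaning that the set $\{g \in S([0,1)) : \#J(g) \le N\}$ is closed for all $N \in \mathbb N \cup \{0\}$.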
Proof. Suppose that $\|g_n - g\| \to 0$ with $\#J(g_n) \le N < \#J(g)$. Using compactness of the space $CL([0,1])$ of closed subsets and passing to a subsequence if necessary, we can arrange that

$$J(g_n) \cup \{0,1\} \xrightarrow[n\to\infty]{} J \cup \{0,1\}$$

for some closed $J \subset (0,1)$, where convergence is understood in the Hausdorff metric $\rho_H$. Since the cardinality is lower semi-continuous with respect to the Hausdorff metric, $J$ must be finite. We conclude for $(s,t) \cap J = \emptyset$ and $\varepsilon > 0$ that $(s + \varepsilon, t - \varepsilon) \cap J(g_n) = \emptyset$ eventually, i.e. $g_n$ is constant on $(s + \varepsilon, t - \varepsilon)$. Next we observe that $g_n \mathbf 1_{(s+\varepsilon, t-\varepsilon)}$ converges towards $g \mathbf 1_{(s+\varepsilon, t-\varepsilon)}$ (in $L^2([0,1))$), which implies that $g$ is constant on $(s + \varepsilon, t - \varepsilon)$. Since $\varepsilon > 0$ was arbitrary, we derive that $g$ is constant on $(s,t)$. Consequently, $g$ is in $S([0,1))$ and $J(g) \subset J$. Using again lower semi-continuity of the cardinality on the space of compact subsets of $[0,1]$ shows that

$$\#J(g) > N \ge \limsup_{n\to\infty} \#J(g_n) \ge \liminf_{n\to\infty} \#J(g_n) \ge \#J \ge \#J(g).$$

This contradiction completes the proof. □

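Now we can state the epi-convergence of $H^{(n)}_{\gamma_n}$ as functionals on $L^2([0,1))$.

Lemma 10. Let $\gamma \ge 0$ and let $(\gamma_n)_{n\in\mathbb N}$ be a sequence with $\gamma_n \to \gamma$ P-a.s., where for $\gamma = 0$ we assume (H). Then

$$H^{(n)}_{\gamma_n}(\cdot, f + \xi_n) \xrightarrow[n\to\infty]{\text{epi}} \tilde H_\gamma(\cdot, f)$$

almost surely. Here $H^{(n)}_{\gamma_n}$ and $\tilde H_\gamma$ are considered as functionals on $L^2([0,1))$.

Proof. We have to show that on a set with probability one we have

(i) If $g_n \to g$, then $\liminf_{n\to\infty} H^{(n)}_{\gamma_n}(g_n, f + \xi_n) \ge \tilde H_\gamma(g, f)$.

(ii) For all $g \in L^2([0,1))$ there exists a sequence $(g_n)_{n\in\mathbb N} \subset L^2([0,1))$ with $g_n \to g$ and $\limsup_{n\to\infty} H^{(n)}_{\gamma_n}(g_n, f + \xi_n) \le \tilde H_\gamma(g, f)$.

To this end, we fix the set where the assertions of Lemmas 7 and 8 hold simultaneously.

Ad (i). Without loss of generality, we may assume that $H^{(n)}_{\gamma_n}(g_n, f + \xi_n)$ converges in $\mathbb R \cup \{\infty\}$. If $g_n \notin S_n([0,1))$ for infinitely many $n$ or $\#J(g_n) > \tilde H_\gamma(g, f)/\gamma_n$, the relation (i) is trivially fulfilled. Otherwise, we obtain

$$\limsup_{n\to\infty} \frac{\beta_n}{n} \#J(g_n) \le \limsup_{n\to\infty} \frac{\beta_n}{n \gamma_n} \tilde H_\gamma(g, f) = 0.$$

Hence we can apply Lemma 8. Together with Lemma 9 we obtain P-a.s.

$$\begin{aligned} \liminf_{n\to\infty} H^{(n)}_{\gamma_n}(g_n, f + \xi_n) &\ge \liminf_{n\to\infty} \gamma_n \#J(g_n) + \liminf_{n\to\infty}\bigl(\|f + \xi_n - g_n\|^2 - \|f + \xi_n\|^2\bigr) \\ &\ge \gamma \#J(g) + \bigl(\|f - g\|^2 - \|f\|^2\bigr) = \tilde H_\gamma(g, f). \end{aligned}$$

Ad (ii). If $g \notin S([0,1))$ and $\gamma > 0$ there is nothing to prove. If $\gamma = 0$ and still $g \notin S([0,1))$, choose $g_n$ as a best $L^2$-approximation of $g$ in $S_n([0,1))$ with at most $\gamma_n^{-1/2}$ jumps.

We claim that $\|g_n - g\| \to 0$ as $n \to \infty$. For that goal, let $g_{n,k}$ denote a best approximation of $g$ in $\{h \in S_n([0,1)) : \#J(h) \le k\}$ and $g_k$ one in $\{h \in S([0,1)) : \#J(h) \le k\}$. Moreover, for every $n, k$ let $J^k_n \subset (0,1)$ be a perturbation of $J(g_k)$ with $n J^k_n \subset \mathbb N$, $\#J^k_n = \#J(g_k)$ and $\rho_H(J^k_n, J(g_k)) \le 1/n$. Denote $g'_{n,k} = g_k \circ \lambda_{n,k}$, where $\lambda_{n,k} \in \Lambda_1$ fulfills $\lambda_{n,k}(J^k_n) = J(g_k)$. Since $(a, b) \mapsto \mathbf 1_{[a,b)}$ is continuous in $L^2([0,1))$, we readily obtain $\|g'_{n,k} - g_k\| \to 0$. This implies for any $k \in \mathbb N$

$$\limsup_{n\to\infty} \|g_n - g\| \le \limsup_{n\to\infty} \|g_{n,k} - g\| \le \limsup_{n\to\infty} \|g'_{n,k} - g\| = \|g_k - g\|.$$

Since the right-hand side can be made arbitrarily small by choosing $k$ (step functions are dense in $L^2([0,1))$), $g_n$ converges to $g$. Since moreover $\gamma_n \#J(g_n) \le \gamma_n^{1/2} \to 0$ and $\beta_n \#J(g_n)/n \le \beta_n/(n \gamma_n^{1/2}) \to 0$ under (H), Lemma 8 yields (ii).

If $\gamma > 0$ and $g \in S([0,1))$, $g_n$ is chosen as a best approximation of $g$ in $S_n([0,1))$ with at most $\#J(g)$ jumps. Finally, in order to obtain (ii), argue as before. □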

To deduce consistency with the help of epi-convergence, one needs to show that 
the minimizers are contained in a compact set. The following lemma will be applied 
to this end. 

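Lemma 11. Assume $(\Theta, \rho)$ is a metric space. A subset $A \subset D([0,\infty), \Theta)$ is relatively compact if the following two conditions hold:

(B1) For all $t \in \mathbb R_+$ there is a compact $K_t \subset \Theta$ such that
$$g(t) \in K_t \qquad (\text{for all } g \in A).$$

(B2) For all $T > 0$ and all $\varepsilon > 0$ there exists a $\delta > 0$ such that for all $g \in A$ there is a step function $g_\varepsilon \in S([0,T), \Theta)$ such that
$$\sup\{\rho(g(t), g_\varepsilon(t)) : t \in [0, T)\} < \varepsilon \quad \text{and} \quad \operatorname{mpl}(g_\varepsilon) > \delta,$$
where mpl is the minimum distance between two jumps of $f \in S([0,T))$,
$$\operatorname{mpl}(f) := \min\{|s - t| : s \ne t \in J(f) \cup \{0, T\}\}.$$

A subset $A \subset D([0,1), \Theta)$ is relatively compact if the following two conditions hold:

(C1) For all $t \in [0,1]$ there is a compact $K_t \subset \Theta$ such that
$$g(t) \in K_t \qquad (\text{for all } g \in A).$$

(C2) For all $\varepsilon > 0$ there exists a $\delta > 0$ such that for all $g \in A$ there is a step function $g_\varepsilon \in S([0,1), \Theta)$ such that
$$\sup\{\rho(g(t), g_\varepsilon(t)) : t \in [0,1]\} < \varepsilon \quad \text{and} \quad \operatorname{mpl}(g_\varepsilon) > \delta.$$

Proof. We prove only the first assertion, as the proof of the second assertion can be carried out in the same manner.

According to [?, Theorem 6.3], it is enough to show that (B2) implies

$$\lim_{\delta \to 0} \sup_{g \in A} w'_g(\delta, T) = 0 \qquad \text{for all } T > 0,$$

where

$$w'_g(\delta, T) = \inf\Bigl\{ \max_{i} \sup_{s, t \in [t_{i-1}, t_i)} \rho(g(s), g(t)) : \{t_0, \dots, t_m\} \subset [0, T],\ t_0 = 0,\ t_m = T,\ |t_i - t_{i-1}| > \delta \Bigr\}.$$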
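So fix $T > 0$ and $\varepsilon > 0$, and choose $\delta$ from (B2). Then we set for $g \in A$: $\{t_0, \dots, t_m\} = J(g_\varepsilon) \cup \{0, T\}$. Clearly, $\operatorname{mpl}(g_\varepsilon) > \delta$ implies $|t_i - t_j| > \delta$ for all $i \ne j$. For neighboring $t_{i-1}, t_i \in J(g_\varepsilon) \cup \{0, T\}$ and $s, t \in [t_{i-1}, t_i)$ we derive

$$\rho(g(s), g(t)) \le \rho(g(s), g_\varepsilon(s)) + \rho(g_\varepsilon(s), g_\varepsilon(t)) + \rho(g_\varepsilon(t), g(t)) < \varepsilon + 0 + \varepsilon = 2\varepsilon.$$

This establishes the above condition and completes the proof. □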

In the context of proving compactness we will also need the following result. 

Lemma 12. For any $f \in L^2([0,1))$ the set $\{f_J : J \subset (0,1),\ \#J < \infty\}$ is relatively compact in $L^2([0,1))$.

Proof. The proof is done in several steps.

1. Since $(s, t) \mapsto \mathbf 1_{[s,t)}$ is continuous, the set

$$\Bigl\{ \sum_{i=1}^{M} \alpha_i \mathbf 1_{I_i} : |\alpha_i| \le z,\ I_i \subset [0,1) \text{ interval} \Bigr\}$$

is the continuous image of a compact set and hence compact for all $M \in \mathbb N$ and $z > 0$.

2. If $f = \mathbf 1_I$ for some interval $I$, we obtain for any finite $J \subset [0,1)$ that $f_J$ is a linear combination of at most three different indicator functions of intervals.

3. If $f = \sum_{i=1}^{M} \alpha_i \mathbf 1_{I_i}$ is a step function and $J$ is arbitrary, then $f_J = \sum_{i=1}^{M'} \alpha'_i \mathbf 1_{I'_i}$ holds by 2. for some $M' \le 3M$. Using

$$\max_i |\alpha'_i| \le \max_i |\alpha_i|$$

as well as 1., we get that $\{f_J : J \subset [0,1)\}$ is relatively compact for step functions $f$.

4. Suppose $f \in L^2([0,1))$ is arbitrary and $\varepsilon > 0$. We want to show that we can cover $\{f_J : J \subset [0,1)\}$ by finitely many $\varepsilon$-balls. Fix a step function $g$ such that $\|f - g\| < \varepsilon/2$. By the Jensen inequality for conditional expectations, we get $\|f_J - g_J\| < \varepsilon/2$ for all finite $J \subset [0,1)$. Further, by 3., there are finite sets $J_1, \dots, J_p \subset [0,1)$ with $p < \infty$ such that $\min_{i=1,\dots,p} \|g_J - g_{J_i}\| < \varepsilon/2$ for all finite $J \subset [0,1)$. This implies

$$\min_{i=1,\dots,p} \|f_J - g_{J_i}\| \le \min_{i=1,\dots,p} \|g_J - g_{J_i}\| + \|f_J - g_J\| < \varepsilon,$$

and the proof is complete. □



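4.2. Behavior of the partial sum process

Proof of Lemma 1. The following exponential Markov inequality is standard for triangular arrays fulfilling condition (5) [21, Section III, §4], and all numbers $\mu_i$, $i = 1, \dots, n$:

$$\mathbb P\Bigl(\Bigl|\sum_{i=1}^{n} \mu_i \xi_{i,n}\Bigr| > z\Bigr) \le 2 \exp\Bigl(-\frac{z^2}{4 \alpha n^{\zeta} \sum_{i=1}^{n} \mu_i^2}\Bigr) \qquad (\text{for all } z \in \mathbb R).$$

From this, we derive for $z^2 > 12\alpha$ that

$$\sum_{n\in\mathbb N} \sum_{1\le i\le j\le n} \mathbb P\Bigl(|\xi_{i,n} + \dots + \xi_{j,n}| > z \sqrt{j - i + 1}\, \sqrt{n^{\zeta} \log n}\Bigr) \le 2 \sum_{n\in\mathbb N} n^2 e^{-\frac{z^2 \log n}{4\alpha}} = 2 \sum_{n\in\mathbb N} n^{2 - \frac{z^2}{4\alpha}} < \infty.$$

Hence, by the Borel-Cantelli lemma, for $\varepsilon > 0$ we have with probability one that

$$\max_{1\le i\le j\le n} \frac{(\xi_{i,n} + \dots + \xi_{j,n})^2}{j - i + 1} > (12 + \varepsilon)\, \alpha\, n^{\zeta} \log n$$

only finitely often. □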
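For the proof of Lemma 2 we need an auxiliary lemma. Denote by

$$\mathcal D_n = \{(i, j) : 1 \le i \le j \le n \text{ such that } i = k 2^l + 1,\ j = (k+1) 2^l \text{ for some } l, k \in \{0, 1, 2, \dots\}\}$$

the set of all pairs which are endpoints of dyadic intervals contained in $\{1, \dots, n\}$.

Lemma 13. Assume $x \in \mathbb R^n$ is such that

$$\max_{(i,j)\in\mathcal D_n} \frac{|x_i + \dots + x_j|}{\sqrt{j - i + 1}} \le c \tag{9}$$

for some $c > 0$. Then

$$\max_{1\le i\le j\le n} \frac{|x_i + \dots + x_j|}{\sqrt{j - i + 1}} \le (2 + \sqrt 2)\, c.$$

Proof. Without loss of generality we may assume that $n = 2^m$ for some $m \in \mathbb N$ (and add some zeros otherwise). First, we prove by induction on $m$ that (9) implies

$$\max_{1\le j\le n} \frac{|x_1 + \dots + x_j|}{\sqrt j} \le (1 + \sqrt 2)\, c. \tag{10}$$

For $m = 0$ there is nothing to prove. Now assume that the statement is true for $m$. Let $2^m < j \le 2^{m+1}$. Note that

$$\frac{|x_1 + \dots + x_j|}{\sqrt j} \le \frac{\sqrt{2^m}}{\sqrt j} \cdot \frac{|x_1 + \dots + x_{2^m}|}{\sqrt{2^m}} + \frac{\sqrt{j - 2^m}}{\sqrt j} \cdot \frac{|x_{2^m+1} + \dots + x_j|}{\sqrt{j - 2^m}}.$$

Apply (9) to the first summand and the induction hypothesis to the second summand to obtain

$$\frac{|x_1 + \dots + x_j|}{\sqrt j} \le \frac{\sqrt{2^m} + (1 + \sqrt 2)\sqrt{j - 2^m}}{\sqrt j}\, c.$$

For $2^m + 1 \le j \le 2^{m+1}$ the expression on the right-hand side is maximal for $j = 2^{m+1}$, with maximum $(1 + \sqrt 2)c$. Hence the statement holds also for $m + 1$, and we have shown that (9) implies (10).

The claim is again proven by induction on $m$. For $m = 0$ there is nothing to prove. Assume that the statement is true for $m$. If $i \le j \le 2^m$ or $2^m < i \le j \le 2^{m+1}$, the statement follows by application of the induction hypothesis to $(x_1, \dots, x_{2^m})$ and $(x_{2^m+1}, \dots, x_{2^{m+1}})$, respectively. Now suppose $i \le 2^m < j$. Then

$$\frac{|x_i + \dots + x_j|}{\sqrt{j - i + 1}} \le \frac{\sqrt{2^m - i + 1}}{\sqrt{j - i + 1}} \cdot \frac{|x_i + \dots + x_{2^m}|}{\sqrt{2^m - i + 1}} + \frac{\sqrt{j - 2^m}}{\sqrt{j - i + 1}} \cdot \frac{|x_{2^m+1} + \dots + x_j|}{\sqrt{j - 2^m}}.$$

Application of (10) to $x' = (x_{2^m}, x_{2^m - 1}, \dots, x_1)$ and $x'' = (x_{2^m+1}, \dots, x_{2^{m+1}})$ then gives

$$\frac{|x_i + \dots + x_j|}{\sqrt{j - i + 1}} \le (1 + \sqrt 2)\, c\, \frac{\sqrt{2^m - i + 1} + \sqrt{j - 2^m}}{\sqrt{j - i + 1}} \le \sqrt 2\, (1 + \sqrt 2)\, c = (2 + \sqrt 2)\, c,$$

where the last inequality uses $\sqrt a + \sqrt b \le \sqrt{2(a + b)}$. □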
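Lemma 13 can be checked numerically: the maximum of the normalized partial sums over all intervals never exceeds $2 + \sqrt 2$ times the maximum over the dyadic pairs in $\mathcal D_n$. The Python sketch below enumerates $\mathcal D_n$ and computes that ratio for random Gaussian vectors; it restricts itself to $n = 2^m$, the case treated directly in the proof, and all function names are hypothetical.

```python
import math
import random

def dyadic_pairs(n):
    """All (i, j) with 1 <= i <= j <= n, i = k * 2**l + 1 and j = (k + 1) * 2**l."""
    pairs, length = [], 1
    while length <= n:
        for k in range(n // length):
            pairs.append((k * length + 1, (k + 1) * length))
        length *= 2
    return pairs

def normalized_sum(prefix, i, j):
    # |x_i + ... + x_j| / sqrt(j - i + 1) with 1-based, inclusive indices
    return abs(prefix[j] - prefix[i - 1]) / math.sqrt(j - i + 1)

def lemma13_ratio(x):
    """(max over all intervals) / (max over dyadic intervals) of the normalized sums."""
    n = len(x)
    prefix = [0.0]
    for v in x:
        prefix.append(prefix[-1] + v)
    c = max(normalized_sum(prefix, i, j) for i, j in dyadic_pairs(n))
    full = max(normalized_sum(prefix, i, j)
               for i in range(1, n + 1) for j in range(i, n + 1))
    return full / c

def max_ratio(trials=50, m=5, seed=0):
    """Largest observed ratio over random Gaussian vectors of length n = 2**m."""
    rng = random.Random(seed)
    return max(lemma13_ratio([rng.gauss(0.0, 1.0) for _ in range(2 ** m)])
               for _ in range(trials))
```

For $n = 8$ there are $8 + 4 + 2 + 1 = 15$ dyadic pairs, in line with the count of at most $2n$ dyadic intervals used in the proof of Lemma 2. The ratio is always at least $1$ because $\mathcal D_n$ is a subset of all intervals.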

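Proof of Lemma 2. [8] show that for $m \ge 1$ and some constant $C_m$ depending on $m$ only

$$\mathbb E \frac{|\xi_{i,n} + \dots + \xi_{j,n}|^{2m}}{(j - i + 1)^m} \le \frac{C_m}{j - i + 1} \sum_{k=i}^{j} \mathbb E|\xi_{k,n}|^{2m} \le C_m \sup_{k,n} \mathbb E|\xi_{k,n}|^{2m}.$$

The Markov inequality then yields for any $z > 0$ and all $1 \le i \le j \le n$

$$\mathbb P\Bigl(\frac{|\xi_{i,n} + \dots + \xi_{j,n}|}{\sqrt{j - i + 1}} > z\Bigr) \le \frac{C_m \sup_{k,n} \mathbb E|\xi_{k,n}|^{2m}}{z^{2m}}.$$

Since there are at most $2n$ dyadic intervals contained in $\{1, \dots, n\}$, we obtain by Lemma 13 for any $C > 0$ that

$$\begin{aligned} \sum_{n\ge 2} \mathbb P\Bigl(\max_{1\le i\le j\le n} \frac{|\xi_{i,n} + \dots + \xi_{j,n}|}{\sqrt{j - i + 1}} > (2 + \sqrt 2)\, C (n \log n)^{1/m}\Bigr) &\le \sum_{n\ge 2} \sum_{(i,j)\in\mathcal D_n} \mathbb P\Bigl(\frac{|\xi_{i,n} + \dots + \xi_{j,n}|}{\sqrt{j - i + 1}} > C (n \log n)^{1/m}\Bigr) \\ &\le \sum_{n\ge 2} \frac{2n\, C_m \sup_{k,n} \mathbb E|\xi_{k,n}|^{2m}}{C^{2m}\, n^2 \log^2 n} < \infty. \end{aligned}$$

The claim follows by application of the Borel-Cantelli lemma. □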
4.3. Consistency of the estimator

The proofs in this section use the concept of epi-convergence, which is introduced in the Appendix.

Proof of Lemma 3. For $\gamma = 0$ there is nothing to prove. Assume $\gamma > 0$ and $g \in S([0,1))$ with $\#J(g) > \|f\|^2/\gamma$. This yields

$$H^\infty_\gamma(0, f) = \|f\|^2 < H^\infty_\gamma(g, f).$$

Moreover, observe that for $g \in S([0,1))$ we have $H^\infty_\gamma(g, f) \ge H^\infty_\gamma(f_{J(g)}, f)$. Thus, it is enough to regard the set $\{f_J : \#J \le \|f\|^2/\gamma\}$, which is relatively compact in $L^2([0,1))$ by Lemma 12. This proves the existence of a minimizer. □

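Proof of Theorem 1 and Theorem 2. By the reformulation of the minimizers in Lemma 5, Lemma 10 and Theorem 5 (see Appendix), it is enough to prove that almost surely there is a compact set containing

$$\bigcup_{n\in\mathbb N} \operatorname{argmin} H^{(n)}_{\gamma_n}(\cdot, f + \xi_n).$$

First note that all $\hat f_n \in \operatorname{argmin} H^{(n)}_{\gamma_n}(\cdot, f + \xi_n)$ have the form $(f + \xi_n)_{J_n}$ for some (random) sets $J_n$. Comparing $H^{(n)}_{\gamma_n}(\hat f_n, f + \xi_n)$ with $H^{(n)}_{\gamma_n}(0, f + \xi_n) = 0$, we obtain the a priori estimate

$$\gamma_n \#J_n \le \|(f + \xi_n)_{J_n}\|^2 \le 2\|f\|^2 + 2\|(\xi_n)_{J_n}\|^2 \le 2\|f\|^2 + \frac{2\beta_n}{n}(\#J_n + 1)$$

for all $n \in \mathbb N$. Since $\gamma_n > 4\beta_n/n$ eventually (by (H), or because $\gamma_n \to \gamma > 0$ and $\beta_n/n \to 0$), we find P-a.s.

$$\frac{\beta_n}{n} \#J_n \le \frac{\beta_n}{n \gamma_n}\Bigl(4\|f\|^2 + \frac{4\beta_n}{n}\Bigr) \xrightarrow[n\to\infty]{} 0.$$

Application of Lemma 7 gives $\lim_{n\to\infty} (\xi_n)_{J_n} = 0$ almost surely. Since by Lemma 12 the set $\{f_{J_n} : n \in \mathbb N\}$ is relatively compact in $L^2([0,1))$, relative compactness of the set $\bigcup_{n\in\mathbb N} \operatorname{argmin} H^{(n)}_{\gamma_n}(\cdot, f + \xi_n)$ follows immediately. This completes the proofs. □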



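Proof of Theorem 3, part (i). Theorem 1 and Lemma 9 imply

$$\liminf_{n\to\infty} \#J_n \ge \#J(f_\gamma).$$

Suppose $\limsup_{n\to\infty} \#J_n \ge \#J(f_\gamma) + 1$. Let $f_{\gamma,n}$ be an approximation of $f_\gamma$ from $S_n([0,1))$ with the same number of jumps as $f_\gamma$. Then we could arrange $f_{\gamma,n} \to f_\gamma$ such that $\lim_{n\to\infty} H^{(n)}_{\gamma_n}(f_{\gamma,n}, f + \xi_n) = \tilde H_\gamma(f_\gamma, f)$. Moreover, we know

$$\limsup_{n\to\infty} H^{(n)}_{\gamma_n}(\hat f_n, f + \xi_n) \ge \gamma + \tilde H_\gamma(f_\gamma, f) > \lim_{n\to\infty} H^{(n)}_{\gamma_n}(f_{\gamma,n}, f + \xi_n),$$

which contradicts that $\hat f_n$ is a minimizer of $H^{(n)}_{\gamma_n}(\cdot, f + \xi_n)$ for all $n$. Therefore, $\#J_n = \#J(f_\gamma)$ eventually.

Next, choose by compactness a subsequence such that $J_n \cup \{0,1\}$ converges in $\rho_H$. Then, by Lemma 9, the limit must be $J(f_\gamma) \cup \{0,1\}$. Consequently, the whole sequence $(J_n)_{n\in\mathbb N}$ converges to $J(f_\gamma)$ in the Hausdorff metric.

Thus, eventually there is a one-to-one correspondence between $P_{J_n}$ and $P_{J(f_\gamma)}$ such that for each $[s,t) \in P_{J(f_\gamma)}$ there is $[s_n,t_n) \in P_{J_n}$ with

$$s_n \xrightarrow[n\to\infty]{} s \quad \text{and} \quad t_n \xrightarrow[n\to\infty]{} t.$$

By Lemma 6 and continuity of $(s,t) \mapsto \mathbf 1_{[s,t)}$, we find

$$\mu_{[s_n,t_n)}(f + \xi_n) \xrightarrow[n\to\infty]{} \mu_{[s,t)}(f).$$

Construct $\lambda_n \in \Lambda_1$ by linearly interpolating $\lambda_n(s_n) = s$. Then

$$L(\lambda_n) \xrightarrow[n\to\infty]{} 0$$

as well as

$$\|\hat f_n - f_\gamma \circ \lambda_n\|_\infty = \max_{[s,t) \in P_{J(f_\gamma)}} \bigl|\mu_{[s_n,t_n)}(f + \xi_n) - \mu_{[s,t)}(f)\bigr| \xrightarrow[n\to\infty]{} 0,$$

which completes the proof. □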



Proof of Theorem 3, part (ii). The proof can be carried out in the same manner as the proof of Theorem 4, part (ii) in [?]. The only difference is that it is necessary to account for the slightly different rates of the partial sum process (4). □
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4- 4- Convergence of scale spaces 

Proof of Lemma. It is clear that each $g \in \operatorname{argmin} H^\infty_\gamma(\cdot, f)$ is determined by its jump set. Further, if $g_1, g_2 \in S([0,1))$ with $\#J(g_1) = \#J(g_2)$ and $\|f - g_1\| = \|f - g_2\|$, then $g_1$ is a minimizer of $H^\infty_\gamma(\cdot, f)$ if and only if $g_2$ is.

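Since $H^\infty_\gamma(0, f) = \|f\|^2$ and $\gamma\,\#J(g) \le H^\infty_\gamma(g, f) \le H^\infty_\gamma(0, f)$ for a minimizer $g$ of $H^\infty_\gamma(\cdot, f)$, we have that $\gamma \in [\nu, \infty)$ implies $\#J(g) \le \|f\|^2/\nu$. Hence on $[\nu, \infty)$ we have that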

$$\min H^\infty_\gamma(\cdot, f) = \min\{k\gamma + \Delta_k(f) : k \le \|f\|^2/\nu\}$$

with $\Delta_k(f)$ defined by

$$\Delta_k(f) := \inf\{\|g - f\|^2 : g \in S([0,1)),\ \#J(g) \le k\}.$$
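For data on an equidistant grid, $\Delta_k$ can be computed with the classical dynamic program for least squares segmentation into a fixed number of pieces. This sketch assumes that $f$ is given by $n$ equidistant values $y_1, \dots, y_n$ (so the norm is $\frac1n\sum_i$) and that candidate jumps are restricted to the grid points. The name `deltas` and the example data are hypothetical.

```python
from typing import List


def deltas(y: List[float], k_max: int) -> List[float]:
    """Delta_k for k = 0..k_max: least squares error with at most k jumps on the grid."""
    n = len(y)
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, v in enumerate(y):
        s1[i + 1] = s1[i] + v
        s2[i + 1] = s2[i] + v * v

    def sse(a: int, b: int) -> float:
        # Sum of squared deviations of y[a:b] from its mean.
        return s2[b] - s2[a] - (s1[b] - s1[a]) ** 2 / (b - a)

    inf = float("inf")
    # cost[j][r]: smallest SSE of y[:r] using exactly j jumps (j + 1 segments).
    cost = [[inf] * (n + 1) for _ in range(k_max + 1)]
    for r in range(1, n + 1):
        cost[0][r] = sse(0, r)
    for j in range(1, k_max + 1):
        for r in range(j + 1, n + 1):
            cost[j][r] = min(cost[j - 1][a] + sse(a, r) for a in range(j, r))

    # "At most k jumps" is a running minimum over j <= k; divide by n for the norm.
    out: List[float] = []
    running = inf
    for j in range(k_max + 1):
        running = min(running, cost[j][n] / n)
        out.append(running)
    return out


print(deltas([0.0, 0.0, 0.0, 1.0, 1.0, 1.0], 2))
```

By construction the sequence $\Delta_0 \ge \Delta_1 \ge \dots$ is nonincreasing, which is what makes the lines $k\gamma + \Delta_k$ in the next paragraph have pairwise different slopes and ordered intercepts.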

For each $\nu > 0$, the map $\gamma \mapsto \min H^\infty_\gamma(\cdot, f)$ is thus a minimum of a finite collection of linear functions with pairwise different slopes on $[\nu, \infty)$. If there are different $k, k'$ and $\gamma$ with $k\gamma + \Delta_k = k'\gamma + \Delta_{k'}$, it follows that $\gamma = (\Delta_{k'} - \Delta_k)/(k - k')$. Hence there are only finitely many $\gamma$ where $\#\{k : k\gamma + \Delta_k(f) = \min H^\infty_\gamma(\cdot, f)\} > 1$. Further, $\operatorname{argmin} H^\infty_\gamma(\cdot, f)$ is completely determined by the $k$ which realize this minimum. Call those $\gamma$ for which different $k$ realize the minimum changepoints of $\gamma \mapsto \min H^\infty_\gamma(\cdot, f)$. Since the above holds true for each $\nu > 0$, there are only countably many changepoints in $[0, \infty)$. This completes the proof. □
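The finiteness argument is the lower envelope of the lines $k\gamma + \Delta_k$, and it can be traced numerically. The following sketch assumes that finitely many values $\Delta_0 \ge \Delta_1 \ge \dots$ are given, for instance from a grid computation. The names `changepoints` and `best_k` and the example values are hypothetical. Starting from $k = 0$, which is optimal for large $\gamma$, it repeatedly moves to the steeper line with the largest crossing point $(\Delta_k - \Delta_{k'})/(k' - k)$, the same formula as in the proof.

```python
from typing import List, Tuple


def changepoints(delta: List[float], nu: float) -> List[Tuple[float, int]]:
    """Changepoints of gamma -> min_k (k * gamma + delta[k]) on [nu, inf).

    Returns pairs (gamma_c, k): just below gamma_c the minimum is attained by k.
    """
    k, upper, out = 0, float("inf"), []
    while True:
        # As gamma decreases only steeper lines j > k can take over; line j
        # crosses line k at (delta[k] - delta[j]) / (j - k).
        cands = [((delta[k] - delta[j]) / (j - k), j) for j in range(k + 1, len(delta))]
        cands = [(g, j) for g, j in cands if nu <= g < upper]
        if not cands:
            return out
        upper, k = max(cands)  # largest crossing first; ties go to the larger j
        out.append((upper, k))


def best_k(delta: List[float], gamma: float) -> int:
    # Brute-force argmin of k * gamma + delta[k], for comparison.
    return min(range(len(delta)), key=lambda k: k * gamma + delta[k])


delta = [0.25, 0.05, 0.04, 0.0]
print(changepoints(delta, nu=0.001))
print([best_k(delta, g) for g in (0.5, 0.1, 0.01)])
```

In the scale parameter $\zeta = 1/\gamma$, these changepoints become the points $1/\gamma_c$. In this example $k = 2$ never realizes the minimum, so the number of jumps increases from 1 to 3 at a single changepoint.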

Proof of Theorem. It is easy to see that the assumptions imply $J(\tau) = \{\gamma_m : m = 1, \dots, m(x)\}$ for the sequence $(\gamma_m)_m \subset \mathbb{R} \cup \{\infty\}$ of Lemma …. Since the scale space $\tau$ is uniquely determined by its jump points, this proves the uniqueness claim.
For the proof of the almost sure convergence, note that Theorem 1 and Theorem 3 part (i) show that $\hat\tau_n(\zeta) \to \tau(\zeta)$ as $n \to \infty$ if $\zeta$ is a point of continuity of $\tau$, i.e. $\#\operatorname{argmin} H^\infty_{1/\zeta}(\cdot, f) = 1$. Convergence in all continuity points together with relative compactness of the sequence implies convergence in the Skorokhod topology. Hence, it is enough to show that $\{\hat\tau_n : n \in \mathbb{N}\}$ is relatively compact.

To this end, we will use Lemma 11. In the proof of Theorem 1 it was shown that the sequence $(\tau^n(Y_n))_{n\in\mathbb{N}}$ is relatively compact in $L^2[0,1)$. To prove relative compactness in $D([0,1))$, we follow the lines of the proof of Theorem 3 part (i). Similarly, we find that

$$\limsup_{n\to\infty} \#J_n \le \max_{g \in \operatorname{argmin} H^\infty_{1/\zeta}(\cdot, f)} \#J(g).$$

For each subsequence of $(\tau^n(Y_n))_{n\in\mathbb{N}}$, consider the subsequence of corresponding jump sets. By compactness of $CL([0,1])$ we choose a converging sub-subsequence and argue as in the proof mentioned above that the corresponding minimizers converge to a limit in $\operatorname{argmin} H^\infty_{1/\zeta}(\cdot, f)$. Thus we have verified condition (B1).



For the proof of (B2) we will show by contradiction that for all $T > 0$ we have $\inf\{\operatorname{mpl}(\hat\tau_n|_{[0,T]}) : n \in \mathbb{N}\} > 0$.



This, obviously, would imply (B2). Observe that $\hat\tau_n$ jumps in $\zeta$ only if there are two jump sets $J \ne J'$ such that $H_{1/\zeta}((Y_n)_J, Y_n) = H_{1/\zeta}((Y_n)_{J'}, Y_n)$ and $H_{1/\zeta}((Y_n)_J, Y_n) \le H_{1/\zeta}((Y_n)_{J''}, Y_n)$ for all $J''$.

If (B2) is not fulfilled for $(\hat\tau_n)_{n\in\mathbb{N}}$, we can switch by compactness to a subsequence and find sequences $(\zeta^1_n)_{n\in\mathbb{N}}, (\zeta^2_n)_{n\in\mathbb{N}}$ with $\zeta^1_n, \zeta^2_n \in J(\hat\tau_n)$, $\zeta^1_n < \zeta^2_n$ and $\zeta^1_n, \zeta^2_n \to \zeta$ for some $\zeta > 0$. Choosing again a subsequence, we could assume that the jump sets $J^1, J^2, J^3$ of minimizers $\hat f^k_n \in \operatorname{argmin} H_{\gamma^k_n}(\cdot, Y_n)$, for some sequences $\gamma^1_n - 1/\zeta^1_n \downarrow 0$, $\gamma^2_n \in (1/\zeta^2_n, 1/\zeta^1_n)$ and $\gamma^3_n - 1/\zeta^2_n \uparrow 0$, are constant and that $(\hat f^k_n)_{n\in\mathbb{N}}$, $k = 1, 2, 3$, converge. Further, we know from this choice of $\gamma^k_n$ and Lemma … that $\#J^3 > \#J^2 > \#J^1$. This implies



(11) 7^ + 7^ + - fnW <ll + \W\Y.,) - /2||2 < |1,"(K„) - /3| 



3 II 2 



The same arguments as in Theorem 1 and Theorem 3 part (i), respectively, yield $\{\lim_{n\to\infty} \hat f^k_n : k = 1, 2, 3\} \subset \operatorname{argmin} H^\infty_{1/\zeta}(\cdot, f)$. Since (11) holds for all $n$, the limits are pairwise different. This contradicts $\#\operatorname{argmin} H^\infty_{1/\zeta}(\cdot, f) \le 2$ and proves (B2).

Thus $\{\hat\tau_n : n \in \mathbb{N}\}$ is relatively compact in $D([0,\infty), D([0,1)))$ as well as in $D([0,\infty), L^2[0,1])$, and the proof is complete. □

Appendix: Epi-Convergence 

Instead of standard techniques from penalized maximum likelihood regression, we use the concept of epi-convergence (see for example [1, 113]). This allows a simple formulation and more structured proofs. The main arguments to derive consistency of estimates which are (approximate) minimizers of a sequence of functionals can be briefly summarized by

epi-convergence + compactness + uniqueness ⇒ a.s. strong consistency.

We give here the definition of epi- (or Γ-) convergence together with the results from variational analysis which are relevant for the proofs.

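Definition 1. Let $F_n : \Theta \to \mathbb{R} \cup \{\infty\}$, $n = 1, \dots, \infty$, be numerical functions on a metric space $(\Theta, \rho)$. $(F_n)_{n\in\mathbb{N}}$ epi-converges to $F_\infty$ (symbol $F_n \xrightarrow[n\to\infty]{\text{epi}} F_\infty$) if

(i) for all $\vartheta \in \Theta$ and sequences $(\vartheta_n)_{n\in\mathbb{N}}$ with $\vartheta_n \to \vartheta$,

$$F_\infty(\vartheta) \le \liminf_{n\to\infty} F_n(\vartheta_n);$$

(ii) for all $\vartheta \in \Theta$ there exists a sequence $(\vartheta_n)_{n\in\mathbb{N}}$ with $\vartheta_n \to \vartheta$ such that

$$F_\infty(\vartheta) \ge \limsup_{n\to\infty} F_n(\vartheta_n). \tag{12}$$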
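A standard toy example (not taken from the source) shows why epi-convergence, rather than pointwise convergence, is the appropriate notion when minimizers are of interest. The check of (i) and (ii) for this example is as follows:

```latex
\Theta = \mathbb{R}, \qquad
F_n(\vartheta) =
\begin{cases} 0, & \vartheta = 1/n,\\ 1, & \text{otherwise},\end{cases}
\qquad
F_\infty(\vartheta) =
\begin{cases} 0, & \vartheta = 0,\\ 1, & \text{otherwise}.\end{cases}

% (i): if \vartheta \ne 0 and \vartheta_n \to \vartheta, then \vartheta_n \ne 1/n eventually,
%      so F_n(\vartheta_n) = 1 = F_\infty(\vartheta); if \vartheta = 0, then
%      F_\infty(0) = 0 \le \liminf_n F_n(\vartheta_n).
% (ii): for \vartheta = 0 take \vartheta_n = 1/n, so \limsup_n F_n(1/n) = 0 = F_\infty(0);
%      for \vartheta \ne 0 take \vartheta_n = \vartheta, so \limsup_n F_n(\vartheta) \le 1 = F_\infty(\vartheta).
```

The pointwise limit is the constant $1$, whose argmin is all of $\mathbb{R}$. The epi-limit instead has $\operatorname{argmin} F_\infty = \{0\}$, which is exactly the limit of $\operatorname{argmin} F_n = \{1/n\} \subset [0,1]$, as Theorem 5 (iii) predicts.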

The main conclusions from epi-convergence that we use are given by the following theorem.

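Theorem 5 ([1, Theorem 5.3.6]). Suppose $F_n \xrightarrow[n\to\infty]{\text{epi}} F_\infty$.

(i) For any converging sequence $(\vartheta_n)_{n\in\mathbb{N}}$ with $\vartheta_n \in \operatorname{argmin} F_n$, necessarily $\lim_{n\to\infty} \vartheta_n \in \operatorname{argmin} F_\infty$.

(ii) If there is a compact set $K \subset \Theta$ such that $\emptyset \ne \operatorname{argmin} F_n \subset K$ for large enough $n$, then $\operatorname{argmin} F_\infty \ne \emptyset$ and

$$\operatorname{dist}(\vartheta_n, \operatorname{argmin} F_\infty) \xrightarrow[n\to\infty]{} 0$$

for any sequence $(\vartheta_n)_{n\in\mathbb{N}}$ with $\vartheta_n \in \operatorname{argmin} F_n$.

(iii) If, additionally, $\operatorname{argmin} F_\infty$ is a singleton $\{\vartheta\}$, then

$$\vartheta_n \xrightarrow[n\to\infty]{} \vartheta$$

for any sequence $(\vartheta_n)_{n\in\mathbb{N}}$ with $\vartheta_n \in \operatorname{argmin} F_n$.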
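Here is a numerical illustration of part (ii) for a hypothetical family chosen only for this purpose: $F_n(x) = (x^2 - 1)^2 + x/n$ on the compact set $\Theta = [-2, 2]$. The family converges uniformly to the continuous function $F_\infty(x) = (x^2 - 1)^2$, and uniform convergence to a continuous limit implies epi-convergence. Since $\operatorname{argmin} F_\infty = \{-1, 1\}$ is not a singleton, only (ii) applies. In this sketch the minimizers are approximated by a grid search, which is an assumption of the illustration and not an exact argmin. The helper names are hypothetical.

```python
def grid_argmin(F, lo: float, hi: float, m: int) -> float:
    """Point of smallest value of F on the grid lo + (hi - lo) * i / m, i = 0..m."""
    return min((lo + (hi - lo) * i / m for i in range(m + 1)), key=F)


def make_F(n: int):
    # F_n(x) = (x^2 - 1)^2 + x / n; on [-2, 2] it is within 2/n of F_inf(x) = (x^2 - 1)^2.
    return lambda x: (x * x - 1.0) ** 2 + x / n


def dist_to_argmin_F_inf(x: float) -> float:
    # argmin F_inf = {-1, 1}
    return min(abs(x - 1.0), abs(x + 1.0))


results = {}
for n in (1, 10, 100, 1000):
    x_n = grid_argmin(make_F(n), -2.0, 2.0, 40000)
    results[n] = (x_n, dist_to_argmin_F_inf(x_n))
    print(n, x_n, results[n][1])
```

The tilt $x/n$ makes every minimizer lie near $-1$. Theorem 5 (ii) alone only guarantees that the distance to the set $\{-1, 1\}$ tends to zero, which is what the printed distances show.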